Abstract

The purpose of this study was to compare the characteristics of 
unidimensional ability estimates obtained from data generated from the 
multidimensional IRT (MIRT) compensatory and noncompensatory models. Reckase, 
Carlson, Ackerman and Spray (1986) reported that when the compensatory model 
is used and item difficulty is confounded with dimensionality, the composition 
of the unidimensional ability estimates differs for different points along the 
unidimensional ability scale. Eight data sets (four compensatory, four
noncompensatory) were generated for four different levels of correlated two-
dimensional abilities: ρ = 0, .3, .6, .9. In each set difficulty was
confounded with dimensionality. Each set was then calibrated using the IRT
calibration programs LOGIST and BILOG. BILOG calibration of response vectors
generated to the matched MIRT item parameters appeared to be more affected
than LOGIST by the confounding of difficulty and dimensionality. As the
correlation between the generated two-dimensional abilities increased, the
response data appeared to become more unidimensional, as evidenced in bivariate
plots of θ₁ vs. θ₂ for specified θ̂ quantiles.
A Comparison Study of the Unidimensional IRT
Estimation of Compensatory and Noncompensatory 
Multidimensional Item Response Data 

One of the underlying assumptions of unidimensional item response theory 
(IRT) models is that a person's ability can be estimated in a unidimensional 
latent space. However, researchers and educators have expressed concern 
whether or not the response process to any one item requires only a single 
latent ability. Traub (1983) suggests that many cognitive variables are 
brought to the testing task and that the number used varies from person to 
person. Likewise, the combination of latent abilities required by individuals 
to obtain a correct response may vary from item to item. Caution over the 
application of unidimensional IRT estimation of multidimensional response data 
has been expressed by several researchers including Ansley and Forsyth (1985); 
Reckase, Carlson, Ackerman, and Spray (1986); and Yen (1984).

Using a compensatory multidimensional IRT (MIRT) model, Reckase et al. 
(1986) demonstrated that when dimensionality and difficulty are confounded 
(i.e., easy items discriminate only on θ₁, difficult items discriminate only
on θ₂) the unidimensional ability scale has a different meaning at different
points on the scale. Specifically, for their two-dimensional generated data
set, upper ability deciles differed mainly on θ₂ while the lower deciles
differed mostly on θ₁. These results led the authors to suggest that the
univariate calibration of two-dimensional response data can be explained in 
terms of the interaction between the multidimensional test information and the 
distribution of the two-dimensional abilities. Reckase et al. (1986) examined 
the condition in which ability estimates were uncorrelated. Such an approach 
may not be very realistic, however, since most cognitive abilities tend to be 
correlated. 
Model Definition

The compensatory M2PL model (Reckase, 1985) was used for specification of
compensatory items. The model defines the probability of a correct response
as:

$$P(x_{ij} = 1 \mid \mathbf{a}_i, d_i, \boldsymbol{\theta}_j) = \frac{1}{1 + \exp\left[-\left(\sum_{k} a_{ik}\theta_{jk} + d_i\right)\right]}$$

where x_ij is the response to item i by person j,

θ_jk is the ability parameter for person j on dimension k,

a_ik is the discrimination parameter for item i on dimension k,

d_i is the difficulty parameter for item i.
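The compensatory property of this model can be illustrated with a minimal sketch. It assumes the intercept d enters the exponent with a positive sign, as in the Reckase (1985) form; the function name and the example item are hypothetical and are not taken from the study.

```python
import math

def p_compensatory(theta, a, d):
    """M2PL probability of a correct response.

    theta: abilities on each dimension; a: discriminations on each dimension;
    d: scalar item parameter, assumed here to be added to the weighted sum.
    """
    z = sum(a_k * t_k for a_k, t_k in zip(a, theta)) + d
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical item with equal discrimination on both dimensions.
item_a, item_d = (1.0, 1.0), 0.0

# A low theta_1 is offset by a high theta_2: both examinees have the same
# weighted sum and therefore the same probability of a correct response.
print(p_compensatory((-2.0, 2.0), item_a, item_d))
print(p_compensatory((0.0, 0.0), item_a, item_d))
```

Because only the weighted sum of the abilities matters, any two ability points on the same line a₁θ₁ + a₂θ₂ = constant receive the same probability.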

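The probability of a correct response for the noncompensatory model
proposed by Sympson (1978) is:

$$P(x_{ij} = 1 \mid \boldsymbol{\theta}_i) = c_j + \frac{1 - c_j}{\prod_{n}\left(1 + \exp\left[-1.7\,a_{jn}(\theta_{in} - b_{jn})\right]\right)}$$

where, in this expression, j indexes the item and i the person, the product
is taken over the ability dimensions n, a_jn is the discrimination of item j
in dimension n, and b_jn is the difficulty of item j in dimension n. For this
study, c_j, the guessing parameter, was set to zero.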
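The contrast with the compensatory model can be sketched as follows. This is an illustration only: the function name and the example item are hypothetical, and the guessing parameter defaults to zero as in the study.

```python
import math

def p_noncompensatory(theta, a, b, c=0.0):
    """Sympson-type noncompensatory probability of a correct response.

    The success probability is a product of one logistic term per dimension,
    so a deficit on one dimension cannot be offset by a surplus on another.
    """
    denom = 1.0
    for t_n, a_n, b_n in zip(theta, a, b):
        denom *= 1.0 + math.exp(-1.7 * a_n * (t_n - b_n))
    return c + (1.0 - c) / denom

# Hypothetical item with the same discrimination and difficulty on both dimensions.
item_a, item_b = (1.0, 1.0), (0.0, 0.0)

print(p_noncompensatory((0.0, 0.0), item_a, item_b))   # both abilities at the difficulty
print(p_noncompensatory((-2.0, 2.0), item_a, item_b))  # low theta_1, high theta_2
```

For the second examinee the high θ₂ does not make up for the low θ₁, so the probability falls well below that of the first examinee, which is the defining difference from the compensatory case.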



Method 



To test the effects of correlated ability dimensions, four levels of
correlation were selected (ρ = .0, .3, .6, and .9).
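One way to draw two-dimensional abilities at a chosen correlation is sketched below. It assumes standard normal marginals and uses a simple linear combination of two independent normal deviates; the function names, sample size, and seed are illustrative choices, not the generation procedure reported by the author.

```python
import math
import random

def correlated_abilities(n, rho, seed=0):
    """Draw n (theta_1, theta_2) pairs from a standard bivariate normal with correlation rho."""
    rng = random.Random(seed)
    scale = math.sqrt(1.0 - rho ** 2)
    pairs = []
    for _ in range(n):
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        pairs.append((z1, rho * z1 + scale * z2))
    return pairs

def pearson(x, y):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)

for rho in (0.0, 0.3, 0.6, 0.9):
    theta1, theta2 = zip(*correlated_abilities(1000, rho, seed=42))
    print(rho, round(pearson(theta1, theta2), 2))
```

The printed sample correlations should fall close to the four target values, with the usual sampling error for 1000 examinees.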
Parameters for a set of 40 two-dimensional compensatory items were
selected with difficulty and dimensionality confounded. Discrimination
parameters ranged from a₁ = 1.8, a₂ = .2 to a₁ = .2, a₂ = 1.8. Difficulty was
confounded with dimensionality such that the difficulty parameters ranged from
d = -2.4 (for a₁ = 1.8, a₂ = .2) to d = 2.4 (for a₁ = .2 and a₂ = 1.8). Thus
as the items became more difficult, they discriminated less
along θ₂ and more along θ₁. The guessing parameter was set to zero because
there was concern over how much "noise" would be added to the multidimensional
data with a nonzero guessing parameter.

An item vector plot (See Reckase, 1985) representing the distance and 
direction from the origin to the point of maximum slope (discrimination) is 
shown in Figure 1. The longer a vector is in the third quadrant the easier 
the item, and the longer a vector in the first quadrant, the more difficult 
the item. 
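The quantities behind an item vector can be sketched as follows, using the multidimensional discrimination MDISC = sqrt(a₁² + a₂²), the signed distance D = -d / MDISC, and the angle of the vector from the θ₁ axis. Two assumptions are made here: the intercept d is added to the weighted abilities in the model, and the 40 items are equally spaced between the reported extremes, a spacing the text does not state.

```python
import math

def item_vector(a, d):
    """Return (MDISC, signed distance D, angle in degrees from the theta_1 axis)."""
    mdisc = math.sqrt(sum(a_k ** 2 for a_k in a))
    distance = -d / mdisc          # positive: first quadrant; negative: third quadrant
    angle = math.degrees(math.acos(a[0] / mdisc))
    return mdisc, distance, angle

# Hypothetical equally spaced parameter grid between the reported end points.
n_items = 40
items = []
for i in range(n_items):
    frac = i / (n_items - 1)
    a1 = 1.8 - 1.6 * frac
    a2 = 0.2 + 1.6 * frac
    d = -2.4 + 4.8 * frac
    items.append(((a1, a2), d))

for label, idx in (("first item", 0), ("last item", n_items - 1)):
    a, d = items[idx]
    mdisc, dist, angle = item_vector(a, d)
    print(label, round(mdisc, 2), round(dist, 2), round(angle, 1))
```

Under these assumptions the first item points almost directly along the θ₁ axis and the last item almost directly along the θ₂ axis, with distances of equal size and opposite sign.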

Corresponding noncompensatory items (same probability of a correct
response) were created using a least squares approach to minimize the
difference

$$\sum_{\boldsymbol{\theta}} \left[ P_{C}(\boldsymbol{\theta}, a, d) - P_{NC}(\boldsymbol{\theta}, a, b) \right]^{2}$$

where P_C is the compensatory model's probability of correct response,
P_NC is the noncompensatory model's probability of correct response,
and θ is a vector of two dimensional abilities generated from a
bivariate normal distribution.



Insert Figure 1 about here


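The matching criterion can be sketched in a few lines. The optimizer used in the study is not described, so a simple derivative-free compass search is substituted here purely for illustration; the function names, starting values, number of ability points, and seed are all assumptions, and the intercept d is assumed to enter the compensatory model with a positive sign.

```python
import math
import random

def logistic(z):
    """Logistic function with the argument clamped to avoid overflow."""
    z = max(-35.0, min(35.0, z))
    return 1.0 / (1.0 + math.exp(-z))

def p_comp(theta, a, d):
    """Compensatory probability for a two-dimensional item."""
    return logistic(a[0] * theta[0] + a[1] * theta[1] + d)

def p_noncomp(theta, a, b):
    """Noncompensatory probability with the guessing parameter fixed at zero."""
    prob = 1.0
    for t_n, a_n, b_n in zip(theta, a, b):
        prob *= logistic(1.7 * a_n * (t_n - b_n))
    return prob

def sum_sq_diff(params, thetas, a_c, d_c):
    """Sum over ability points of (P_C - P_NC)^2 for candidate (a1, a2, b1, b2)."""
    a_nc, b_nc = params[:2], params[2:]
    return sum((p_comp(t, a_c, d_c) - p_noncomp(t, a_nc, b_nc)) ** 2 for t in thetas)

def compass_search(f, x0, step=0.5, tol=1e-3, max_iter=300):
    """Try +/- step on each coordinate; halve the step when nothing improves."""
    x, fx = list(x0), f(x0)
    for _ in range(max_iter):
        if step < tol:
            break
        improved = False
        for i in range(len(x)):
            for delta in (step, -step):
                trial = list(x)
                trial[i] += delta
                f_trial = f(trial)
                if f_trial < fx:
                    x, fx, improved = trial, f_trial, True
        if not improved:
            step /= 2.0
    return x, fx

rng = random.Random(1)
thetas = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(200)]  # uncorrelated case
a_c, d_c = (1.8, 0.2), -2.4        # one extreme item from the reported parameter range
objective = lambda p: sum_sq_diff(p, thetas, a_c, d_c)
start = [1.0, 1.0, 0.0, 0.0]       # arbitrary starting values (a1, a2, b1, b2)
fitted, loss = compass_search(objective, start)
print("a1, a2, b1, b2 =", [round(v, 2) for v in fitted], "loss =", round(loss, 4))
```

Because the criterion is summed over sampled ability points, the fitted noncompensatory parameters depend on the ability distribution, which is consistent with the study creating a separate noncompensatory item set for each level of correlation.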

Four noncompensatory item sets corresponding to the four levels of
correlation among the two ability dimensions (ρ = .0, .3, .6, and .9) were
created.

Item difficulties for each item (d for the 40 compensatory items and b₁,
b₂ for the 40 noncompensatory items) for the ρ = 0.0 case are plotted in
Figure 2a. It is interesting to compare the two sets. The selected d values
are positively related to the item number. In the noncompensatory items, b₂
is highest for item 1 and decreases steadily as the item number increases.
Difficulties for dimension 1 do not vary greatly over the item set.



Insert Figure 2a, b about here 



The discrimination parameters for both the compensatory and
noncompensatory items for the ρ = 0.0 case are displayed for each item in
Figure 2b. The a₁ parameters for each model are greatest in item 1 and
decrease with item number. The a₁ parameter is greater for the
noncompensatory model for all items and decreases at a slower rate than its
compensatory counterpart. The a₂ parameters for each model are lowest for the
first item and greatest for the last item. The a₂ parameters constantly
increase with item number.

To help understand how the probability of a correct response differs in 
each model, several item response surfaces (IRS) and corresponding contour 
generated using the same (θ₁, θ₂) combinations as produced the compensatory
response data sets.

Descriptive statistics were then obtained for each of the eight data 
sets. This was done to validate the similarities in item difficulty and to 
show the dimensionality of the data. These results are displayed in Table 1. 



Insert Table 1 about here 



The eight item response sets have the same mean difficulty, with the range of
p values also similar. The mean biserials for compensatory and
noncompensatory item sets appear to be more similar as the correlation between
abilities increases. As the mean biserials increase, the KR-20 reliability
coefficient also increases. Eigenvalues of the principal component analysis of the
inter-item tetrachoric matrix were computed. Evidence of multidimensionality can be
seen by forming a ratio of the first to the second eigenvalue, λ₁/λ₂ (see Hambleton
& Murray, 1983). As the correlation between the abilities increases, the
ratio increases, suggesting a more dominant first principal component and that
the data are almost unidimensional.
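The eigenvalue ratios implied by Table 1 can be computed directly from the tabled first and second eigenvalues. The short sketch below does only that arithmetic; the dictionary layout and variable names are illustrative.

```python
# First and second eigenvalues from Table 1, keyed by ability correlation.
eigenvalues = {
    "COMP": {0.0: (9.24, 2.94), 0.3: (10.84, 2.59), 0.6: (12.17, 2.27), 0.9: (13.38, 2.00)},
    "NCMP": {0.0: (7.22, 3.17), 0.3: (9.52, 2.69), 0.6: (11.64, 2.25), 0.9: (13.53, 1.98)},
}

ratios = {
    model: {rho: first / second for rho, (first, second) in by_rho.items()}
    for model, by_rho in eigenvalues.items()
}

for model, by_rho in ratios.items():
    for rho in sorted(by_rho):
        print(model, rho, round(by_rho[rho], 2))
```

For both models the ratio rises steadily with the ability correlation, roughly doubling or tripling between the uncorrelated case and the ρ = .9 case.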

Each dataset was then calibrated twice, once using LOGIST (Wingersky, 
Barton, & Lord, 1982), and again using BILOG (Mislevy & Bock, 1982). The two 
IRT calibration programs use different estimation procedures. LOGIST uses 
joint maximum likelihood estimation. The default method of scoring subjects 
was selected for all BILOG computer runs. The default method of scoring was 
expectation a posteriori using a normal N(0, 1) Bayesian prior. The default 
priors were also used in the item parameter calibration: a log-normal prior 
on the discrimination estimates and no prior on the difficulty estimates. 
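Expectation a posteriori scoring with a N(0, 1) prior can be sketched as a weighted average over a grid of ability values. This is a generic illustration for a two-parameter logistic model with known item parameters; the evenly spaced grid, the 1.7 scaling constant, and the function name are assumptions and do not reproduce BILOG's internal quadrature.

```python
import math

def eap_score(responses, a, b, n_points=41, lo=-4.0, hi=4.0):
    """Posterior mean of theta under a standard normal prior (grid approximation).

    responses: 0/1 item scores; a, b: known discriminations and difficulties.
    """
    width = (hi - lo) / (n_points - 1)
    numerator = denominator = 0.0
    for q in range(n_points):
        theta = lo + q * width
        weight = math.exp(-0.5 * theta * theta)      # unnormalized N(0, 1) density
        for u, a_i, b_i in zip(responses, a, b):
            p = 1.0 / (1.0 + math.exp(-1.7 * a_i * (theta - b_i)))
            weight *= p if u else 1.0 - p            # likelihood of the response
        numerator += theta * weight
        denominator += weight
    return numerator / denominator

# Hypothetical three-item test.
a_hat, b_hat = [1.0, 1.0, 1.0], [-1.0, 0.0, 1.0]
print(round(eap_score([1, 1, 1], a_hat, b_hat), 2))
print(round(eap_score([0, 0, 0], a_hat, b_hat), 2))
```

Unlike a maximum likelihood estimate, the posterior mean stays finite for all-correct and all-incorrect response patterns because the prior pulls the estimate toward zero.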
These data were then evaluated to determine the effect of confounding
difficulty with dimensionality for both the compensatory and noncompensatory
item sets. In addition, the effect of correlation between ability dimensions
was studied.

Results 

To estimate the LOGIST and BILOG orientation in the two dimensional ability
plane, the ability estimates from each calibration run were first rescaled to the
compensatory ability estimates for the ρ = 0.0 case. The θ̂ values for each calibration
run were rank ordered and divided into twenty quantiles. The means of
the θ₁ and θ₂ parameters for each quantile were then calculated and plotted. These
centroid plots were then examined to see if there was any curvilinearity suggesting
that the composite (θ₁, θ₂) combination was not uniform across the univariate scale,
as predicted by the Reckase et al. (1986) study.
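The centroid computation described above can be sketched as follows. The function name is hypothetical, and the demonstration data use the average of the two true abilities as a stand-in for a calibrated θ̂, since no LOGIST or BILOG output is available here.

```python
import random

def quantile_centroids(theta_hat, theta1, theta2, n_groups=20):
    """Rank examinees by theta_hat, split into n_groups, and average (theta1, theta2) per group."""
    order = sorted(range(len(theta_hat)), key=lambda i: theta_hat[i])
    centroids = []
    for g in range(n_groups):
        start = g * len(order) // n_groups
        stop = (g + 1) * len(order) // n_groups
        members = order[start:stop]
        mean1 = sum(theta1[i] for i in members) / len(members)
        mean2 = sum(theta2[i] for i in members) / len(members)
        centroids.append((mean1, mean2))
    return centroids

# Stand-in data: uncorrelated abilities and an estimate that weights them equally.
rng = random.Random(3)
theta1 = [rng.gauss(0, 1) for _ in range(1000)]
theta2 = [rng.gauss(0, 1) for _ in range(1000)]
theta_hat = [(t1 + t2) / 2 for t1, t2 in zip(theta1, theta2)]

for mean1, mean2 in quantile_centroids(theta_hat, theta1, theta2):
    print(round(mean1, 2), round(mean2, 2))
```

With an estimate that weights both abilities equally, the centroids fall close to a straight 45° line; curvature in such a plot indicates that the composite measured by θ̂ changes along the scale.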

The centroid plots for the LOGIST calibration of the four compensatory 
and four noncompensatory data sets are shown in Figures 4a and b. The BILOG 
counterparts are presented in Figures 5a and b. 



Insert Figures 4a and b, 5a and b about here



The LOGIST orientation appears to be similar for each level of
correlation and for each type of MIRT model. The BILOG centroids are
noticeably more variable. For the BILOG centroids, as ρ approaches zero,
the plots of the centroids increase in curvature. Thus, BILOG appears to be
more sensitive to the confounding of difficulty and dimensionality. When the
ability correlation is .9, the centroids for both calibration programs are
almost linear. This is somewhat predictable because if the abilities are
highly correlated, their response data would be expected to be unidimensional.

The correlations between θ̂ (univariate estimate) and each of the two
abilities (θ₁ and θ₂) and the mean absolute difference (MAD) between θ̂ and
each of θ₁ and θ₂ are shown in Table 2. Compared to the centroid plots, the
data are much more alike for compensatory and noncompensatory data sets and
for LOGIST estimates compared to BILOG's estimates. It is interesting to note
that the univariate ability estimates correlate about equally with θ₁ and θ₂ for
all levels of ability correlation and for each model. The correlations
between θ̂ and θ₁, and between θ̂ and θ₂, range from .59 (ρ = 0) to .95 (ρ = .9).
These results parallel the mean absolute differences: as the correlation between
ability dimensions increases, the MAD values decrease. Thus as the data become
more unidimensional, the MAD and correlational values support that the programs
both appear to align the univariate scale about equidistant from the ability
axes.
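The two summary statistics reported in Table 2 can be computed as in the sketch below. The function names and the tiny data set are hypothetical and serve only to show the arithmetic.

```python
import math

def pearson(x, y):
    """Sample Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)

def mean_abs_diff(x, y):
    """Mean absolute difference between paired values."""
    return sum(abs(xi - yi) for xi, yi in zip(x, y)) / len(x)

# Hypothetical true abilities and univariate estimates for five examinees.
theta1 = [-1.2, -0.4, 0.0, 0.5, 1.3]
theta_hat = [-0.9, -0.6, 0.1, 0.2, 1.0]

print(round(pearson(theta_hat, theta1), 2))
print(round(mean_abs_diff(theta_hat, theta1), 2))
```

The correlation is insensitive to differences in scale between the estimates and the parameters, whereas the mean absolute difference is not, which is why the estimates were rescaled before these comparisons were made.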



Insert Table 2 about here 



For the compensatory data sets, correlations and MAD values between â
(univariate discrimination) and a₁, a₂, and between b̂ (univariate difficulty) and d are
shown in Table 3. As the correlation between abilities increases, the correlations
between â and a₁ and between â and a₂ approach zero for both LOGIST and BILOG. MAD values
between the discrimination estimates and parameters were slightly higher for BILOG
in all correlational conditions. For both programs, the correlation between b̂ and d
was .99 in absolute value for all data sets. This would suggest very strongly that the
pattern of difficulty between the individual items is recoverable to a high degree.



Insert Table 3 about here 



Correlations and average MAD values between the discrimination and
difficulty parameters and their estimates for the noncompensatory data sets
are displayed in Table 4. The pattern of correlations between discrimination
parameters and estimates is similar to that of the compensatory data. The
correlations between b̂ and b₁ are all .99, while the correlations
between b̂ and b₂ range from r = .38 to r = .42 for both LOGIST and BILOG.
This suggests that for the noncompensatory data there is a tendency to measure
one dimension more strongly. This may also be due to the restricted range of
b₂ values.



Insert Table 4 about here 



In both the compensatory and noncompensatory data sets, the â's correlated
positively with a₁ and negatively with a₂ except for the ρ = .9 case.
Noticeable differences exist between the MAD values for the noncompensatory
discrimination parameters and estimates for BILOG and LOGIST. For LOGIST the
average absolute differences of both a₁ - â and a₂ - â range from .80 to .86,
while the range is .32 to .38 for BILOG. For both calibration programs the
correlations between â and a₂ are negative except for the ρ = .9 case in
which the pattern reverses.

Conclusion 

Differences between the item response surfaces for each model when the
item parameters are matched appear to be minimal and exist in places of
the θ₁, θ₂ plane where very few subjects would be expected to be found. Mean
p-values for the eight sets were identical and the matches on biserial
correlations were almost identical for the ρ = .9 case. Thus the least
squares matching procedure appears to be an excellent method of matching the
two MIRT models.

The confounding of difficulty with dimensionality, which was reported in
the results of the Reckase et al. (1986) study, was replicated, however, only
for the BILOG calibration of response data in which ρ was closer to 0.0.
The "wrap around" effect of the θ₁, θ₂ centroids did not occur for any of the
LOGIST estimation runs. It should be noted, however, that in the Reckase et al.
study the items measured only θ₁ or θ₂, whereas in this study each item measured a
combination of θ₁ and θ₂ to varying degrees. Thus the confounding was not as
great as in the Reckase et al. study. Another possible explanation may be the
method of estimation. Perhaps the marginal maximum likelihood procedure of
BILOG is more sensitive to the confounding of difficulty and dimensionality.

The confounding of difficulty appeared to have the same effect on the
ability parameter estimates for both the compensatory and noncompensatory
data sets. Despite different test information patterns, as seen in the INFLINE
plots, the orientation of the centroids appeared to be the same for each
calibration program's estimation of the two MIRT models.
The correlations among ability parameters and estimates suggest that as
the relationship between the two ability dimensions becomes more linear, the
data in a sense become unidimensional. As ρ approached .9, θ̂ correlated
in the mid .90's with θ₁ and θ₂. This was confirmed by the plots of
the θ₁, θ₂ centroids and the correlations of the discrimination parameters
with their estimates. The correlations between â and a₁ and between â and a₂ became
closer as the correlation between θ₁ and θ₂ increased. Likewise, as
ρ approached .9, the centroids appeared to align themselves along a 45°
line. Both of these results suggest that θ₁ and θ₂ were being measured equally.

The plots of the θ₁, θ₂ centroids for the 20 θ̂ quantiles revealed
differences between the two estimation programs. The centroid plot for LOGIST
revealed only a slight confounding effect as ρ became closer to zero.
However, the θ₁, θ₂ centroids for BILOG's θ̂ display a much sharper wrapping around
about the negative θ₂ axis and the positive θ₁ axis, especially
when ρ = 0. Thus it would appear that BILOG is more sensitive to the
confounding of difficulty with dimensionality for both MIRT models.

Several directions for future research are suggested by this study. One
area for future research would be to systematically vary test information with
different two dimensional ability distributions to determine how the
interaction of the two affects the orientation of the univariate ability scale
in the two-dimensional plane. Also, the differences between maximum
likelihood and marginal maximum likelihood estimation of multidimensional
response data need to be further explored.
Table 1

Descriptive statistics of the multidimensional data sets (N = 1000, i = 40)

Data         Eigenvalues             Mean   Range of p    Mean   Range of bis    Raw Score
Type   ρ     First    Second  KR-20  p      Lo     Hi     bis    Lo     Hi       Mean    SD
--------------------------------------------------------------------------------------------
COMP   .0     9.24    2.94    .91    .50    .16    .85    .64    .50    .71      20.15    8.64
       .3    10.84    2.59    .93    .50    .17    .84    .69    .57    .75      20.18    9.41
       .6    12.17    2.27    .94    .50    .18    .84    .73    .59    .79      20.15   10.04
       .9    13.38    2.00    .95    .50    .18    .83    .76    .61    .82      20.18   10.61
NCMP   .0     7.22    3.17    .88    .50    .16    .84    .56    .47    .64      20.03    7.64
       .3     9.52    2.69    .92    .50    .17    .83    .65    .57    .72      20.08    8.84
       .6    11.64    2.25    .94    .50    .17    .84    .71    .65    .76      20.13    9.82
       .9    13.53    1.98    .95    .50    .18    .83    .77    .69    .80      20.00   10.67

Note: Eigenvalues are those of the first and second principal components of the
inter-item tetrachoric correlation matrix. COMP = compensatory; NCMP = noncompensatory.
Table 2

Correlations and mean absolute differences among θ̂, θ₁, and θ₂ by levels of
correlation for compensatory and noncompensatory data sets

         Data
Calib    Type    ρ     r(θ̂, θ₁)   r(θ̂, θ₂)   Σ|θ₁ - θ̂|/k   Σ|θ₂ - θ̂|/k
---------------------------------------------------------------------------
LOGIST   COMP    .0    .67        .64        .65           .67
                 .3    .76        .76        .53           .53
                 .6    .85        .85        .42           .42
                 .9    .94        .94        .26           .27
BILOG    COMP    .0    .68        .64        .63           .65
                 .3    .78        .76        .53           .54
                 .6    .87        .86        .43           .44
                 .9    .95        .95        .28           .28
LOGIST   NCMP    .0    .65        .60        .66           .70
                 .3    .76        .72        .54           .58
                 .6    .85        .84        .42           .43
                 .9    .94        .94        .27           .28
BILOG    NCMP    .0    .67        .59        .62           .67
                 .3    .77        .73        .53           .56
                 .6    .86        .85        .42           .44
                 .9    .94        .94        .28           .29
Table 3

Correlations and mean absolute differences between LOGIST and BILOG estimates and
parameters under the compensatory model

Program   ρ     r(â, a₁)   r(â, a₂)   r(b̂, d)   Σ|a₁ - â|/k   Σ|a₂ - â|/k   Σ|d - b̂|/k
-----------------------------------------------------------------------------------------
LOGIST    0.0    .30       -.30       -.99      .40           .45           2.09
          0.3    .26       -.26       -.99      .41           .45           2.09
          0.6    .17       -.17       -.99      .41           .44           2.09
          0.9   -.07        .07       -.99      .42           .43           2.09
BILOG     0.0    .26       -.26       -.99      .48           .52           2.09
          0.3    .18       -.18       -.99      .49           .50           2.10
          0.6    .19       -.19       -.99      .48           .50           2.10
          0.9   -.04        .04       -.99      .48           .48           2.10
Table 4

Correlations and mean absolute differences between LOGIST and BILOG estimates and parameters of the
noncompensatory model

Program   ρ     r(â, a₁)   r(â, a₂)   r(b̂, b₁)   r(b̂, b₂)   Σ|a₁ - â|/k   Σ|a₂ - â|/k   Σ|b₁ - b̂|/k   Σ|b₂ - b̂|/k
----------------------------------------------------------------------------------------------------------------------
LOGIST    0.0    .31       -.23       .99        .42        .86           .82           .67           .87
          0.3    .27       -.22       .99        .41        .85           .81           .67           .87
          0.6    .19       -.14       .99        .40        .84           .80           .67           .86
          0.9   -.07        .06       .99        .38        .84           .80           .67           .85
BILOG     0.0    .28       -.21       .99        .42        .35           .38           .67           .86
          0.3    .19       -.17       .99        .41        .33           .37           .67           .86
          0.6    .19       -.20       .99        .40        .32           .35           .67           .86
          0.9   -.40       -.01       .99        .39        .32           .35           .67           .85
Figure Captions

Figure 1. Vectors representing the distance and direction from the origin to
the point of maximum discrimination for the 40 generated compensatory items.

Figure 2. Difficulty (2a) and Discrimination (2b) parameter values for the 40
matched compensatory and noncompensatory items (ρ = 0.0).

Figure 3. The item response surface and contour plot of matched compensatory
and noncompensatory items: 1 (3a), 20 (3b) and 40 (3c).

Figure 4. Test information vectors at selected points in the ability plane
for the compensatory (4a) and noncompensatory (4b) data sets.

Figure 5. A plot of the centroids for the LOGIST calibrated compensatory (5a)
and noncompensatory (5b) response sets for each level of correlation among the
two-dimensional abilities.

Figure 6. A plot of the centroids for the BILOG calibrated compensatory (6a)
and noncompensatory (6b) response sets for each level of correlation among the
two-dimensional abilities.
[Figure 3c: Item 40]